
Towards Exascale Scientific Metadata Management

Author: Blanas Spyros
Byna Surendra
Publication date: 29/03/2015

Advances in technology and computing hardware are enabling scientists from all areas of science to produce massive amounts of data using large-scale simulations or observational facilities. In this era of data deluge, effective coordination between the data production and the analysis phases hinges on the availability of metadata that describe the scientific datasets. Existing workflow engines have been capturing a limited form of metadata to provide provenance information about the identity and lineage of the data. However, much of the data produced by simulations, experiments, and analyses still needs to be annotated manually in an ad hoc manner by domain scientists. Systematic and transparent acquisition of rich metadata becomes a crucial prerequisite to sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and domain-agnostic metadata management infrastructure that can meet the demands of extreme-scale science is notable by its absence. To address this gap in scientific data management research and practice, we present our vision for an integrated approach that (1) automatically captures and manipulates information-rich metadata while the data is being produced or analyzed and (2) stores metadata within each dataset to permeate metadata-oblivious processes and to query metadata through established and standardized data access interfaces. We motivate the need for the proposed integrated approach using applications from plasma physics, climate modeling and neuroscience, and then discuss research challenges and possible solutions.
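
The two parts of the approach described above, capturing metadata automatically at production time and storing it inside the dataset where it can later be queried, can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the Dataset container (loosely modeled on how HDF5 attaches attributes to a dataset), the capture_metadata decorator, the toy simulate producer and the query helper are hypothetical names, not the authors' design or any library's API.

```python
import functools
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Dataset:
    """Hypothetical container: values plus an attribute dictionary that
    travels with the data (loosely analogous to HDF5 dataset attributes)."""
    values: List[float]
    attrs: Dict[str, Any] = field(default_factory=dict)


def capture_metadata(func: Callable[..., List[float]]) -> Callable[..., Dataset]:
    """Wrap a data-producing function so provenance is recorded automatically,
    at production time, inside the Dataset it returns."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Dataset:
        started = time.time()
        values = func(*args, **kwargs)
        ds = Dataset(values)
        ds.attrs.update({
            "producer": func.__name__,   # identity of the producing step
            "args": list(args),          # positional parameters used
            "kwargs": dict(kwargs),      # keyword parameters used
            "created_at": started,       # wall-clock start time
            "n_values": len(values),
        })
        return ds
    return wrapper


@capture_metadata
def simulate(steps: int, dt: float = 0.1) -> List[float]:
    # Toy "simulation" (hypothetical): position under unit velocity.
    return [i * dt for i in range(steps)]


def query(datasets: List[Dataset], **criteria: Any) -> List[Dataset]:
    """Select datasets whose embedded attributes match every criterion."""
    return [d for d in datasets
            if all(d.attrs.get(k) == v for k, v in criteria.items())]


runs = [simulate(5), simulate(3, dt=0.5)]
print([d.attrs["n_values"] for d in query(runs, producer="simulate")])  # [5, 3]
```

Because the metadata lives in the same object as the values, any consumer that receives the Dataset also receives its provenance, even if that consumer never looks at it; this mirrors, in miniature, the abstract's goal of letting metadata permeate metadata-oblivious processes.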

arXiv.org e-Print Archive

eScholarship - University of California

ArrayBridge: Interweaving declarative array processing with high-performance computing

Author: Blanas Spyros
Brown Paul
Byna Suren
Floratos Sofoklis
Prabhat
Wu Kesheng
Xing Haoyuan
Publication date: 01/01/2017

Scientists are increasingly turning to datacenter-scale computers to produce and analyze massive arrays. Despite decades of database research that extols the virtues of declarative query processing, scientists still write, debug and parallelize imperative HPC kernels even for the most mundane queries. This impedance mismatch has been partly attributed to the cumbersome data loading process; in response, the database community has proposed in situ mechanisms to access data in scientific file formats. Scientists, however, desire more than a passive access method that reads arrays from files. This paper describes ArrayBridge, a bi-directional array view mechanism for scientific file formats, that aims to make declarative array manipulations interoperable with imperative file-centric analyses. Our prototype implementation of ArrayBridge uses HDF5 as the underlying array storage library and seamlessly integrates into the SciDB open-source array database system. In addition to fast querying over external array objects, ArrayBridge produces arrays in the HDF5 file format just as easily as it can read from it. ArrayBridge also supports time travel queries from imperative kernels through the unmodified HDF5 API, and automatically deduplicates between array versions for space efficiency. Our extensive performance evaluation in NERSC, a large-scale scientific computing facility, shows that ArrayBridge exhibits performance and I/O scalability statistically indistinguishable from the native SciDB storage engine.
Comment: 12 pages, 13 figures
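
Deduplication between array versions and time travel, as mentioned above, can be illustrated with a generic content-addressed chunk store. This is a minimal sketch under stated assumptions, not ArrayBridge's implementation: the class VersionedArrayStore, its commit/read methods, the 1-D float arrays and the fixed chunk size are all hypothetical choices made for illustration. Each version is split into fixed-size chunks, each chunk is keyed by its SHA-256 hash, and a chunk shared by several versions is stored only once.

```python
import hashlib
from array import array
from typing import Dict, List


class VersionedArrayStore:
    """Hypothetical sketch: chunk-level deduplication across array versions."""

    def __init__(self, chunk_size: int = 4) -> None:
        self.chunk_size = chunk_size
        self.chunks: Dict[str, array] = {}     # content hash -> chunk data
        self.versions: List[List[str]] = []    # version -> ordered chunk hashes

    def commit(self, values: List[float]) -> int:
        """Store a new version of a 1-D float array; return its version number."""
        refs = []
        for start in range(0, len(values), self.chunk_size):
            chunk = array("d", values[start:start + self.chunk_size])
            key = hashlib.sha256(chunk.tobytes()).hexdigest()
            self.chunks.setdefault(key, chunk)  # identical chunks stored once
            refs.append(key)
        self.versions.append(refs)
        return len(self.versions) - 1

    def read(self, version: int) -> List[float]:
        """Time travel: rebuild any stored version from its chunk references."""
        return [x for key in self.versions[version] for x in self.chunks[key]]


store = VersionedArrayStore(chunk_size=4)
v0 = store.commit([float(i) for i in range(12)])  # 3 chunks, all new
modified = [float(i) for i in range(12)]
modified[5] = -1.0                                # touches only the second chunk
v1 = store.commit(modified)
print(len(store.chunks))   # 4: only the modified chunk was added
print(store.read(v0)[5])   # 5.0: the old version is still readable
```

The design choice here is that storage grows with the number of distinct chunks rather than the number of versions, so a small in-place update costs one chunk instead of a full copy; a real system would also have to choose chunk shapes for multidimensional arrays, which this 1-D sketch ignores.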

arXiv.org e-Print Archive

eScholarship - University of California

Towards Exascale Scientific Metadata Management

Author: Blanas Spyros
Publication date: 01/12/2018

Ezid

Contention-Based Performance Evaluation of Multidimensional Range Search in Peer-to-peer Networks

Author: Spyros Blanas
Vasilis Samoladas
Publication venue: European Alliance for Innovation n.o.
Publication date: 01/01/2007

Performance evaluation of peer-to-peer search techniques has been based on simple performance metrics, such as message hop counts and total network traffic, mostly disregarding their inherent concurrent nature, where contention may arise. This paper is concerned with the effect of contention in complex P2P network search, focusing on techniques for multidimensional range search. We evaluate peer-to-peer networks derived from recently proposed works, introducing two novel metrics related to concurrency and contention, namely responsiveness and throughput. Our results highlight the impact of contention on these networks, and demonstrate that some studied networks do not scale in the presence of contention. Also, our results indicate that certain network properties believed to be desirable (e.g., uniform data distribution or peer accesses) may not be as critical as previously believed.
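
Why hop counts alone can hide contention, as argued above, can be shown with a standard bottleneck bound from operational analysis. This is an illustrative sketch, not the paper's definitions of responsiveness or throughput: the function hop_stats, the route lists and the assumption that every peer processes at most one message per time unit are all hypothetical. Two workloads with the same mean hop count can differ sharply in sustainable throughput when one peer sits on every route.

```python
from collections import Counter
from typing import Dict, List, Sequence


def hop_stats(routes: Sequence[List[str]]) -> Dict[str, float]:
    """Compare a contention-blind metric with a contention-aware upper bound.

    routes: one list per query of the peers that each process one message
    for that query (hypothetical workload). Assumes each peer handles at
    most one message per time unit and the query mix repeats indefinitely.
    """
    mean_hops = sum(len(r) for r in routes) / len(routes)
    load = Counter(peer for r in routes for peer in r)  # messages per peer per mix
    # If the mix is issued x times per time unit, the busiest peer must handle
    # max_load * x messages per time unit, so x <= 1 / max_load; each mix
    # contains len(routes) queries.
    max_throughput = len(routes) / max(load.values())   # queries per time unit
    return {"mean_hops": mean_hops, "max_throughput": max_throughput}


balanced = [["a", "b"], ["c", "d"], ["e", "f"], ["g", "h"]]
hotspot = [["a", "r"], ["b", "r"], ["c", "r"], ["d", "r"]]  # peer r on every route
print(hop_stats(balanced))  # {'mean_hops': 2.0, 'max_throughput': 4.0}
print(hop_stats(hotspot))   # {'mean_hops': 2.0, 'max_throughput': 1.0}
```

Both workloads score identically on mean hop count, yet the hotspot caps throughput at a quarter of the balanced case, which is the kind of effect a contention-aware evaluation is meant to expose.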

CiteSeerX

Crossref